Provincial  Report 

Grade  6 Language  Arts  ( English) 
Achievement Test

October  1984 


Student  Evaluation 




Alberta

EDUCATION 




DISTRIBUTION: 


Superintendents  of  Schools 

School  Principals  and  Teachers 

The  Alberta  Teachers'  Association 

Alberta  School  Trustees'  Association 

Alberta  Education 

General  Public  Upon  Request 




EXECUTIVE  SUMMARY 


Description  of  the  Test 

The Grade 6 Language Arts (English) Achievement Test was a two-part test.

Part  A:  Expressive  Language  - Writing  required  the  students  to  write  a story 
based  on  a story  starter.  Students  were  given  one  hour  for  writing  Part  A. 
Part  B:  Receptive  Language  - Reading  consisted  of  50  multiple-choice  questions 
linked  to  brief  reading  selections.  Included  in  Part  B were  11  questions  from 
the  Minister's  Advisory  Committee  on  Student  Achievement  (MACOSA)  Test,  1978. 


Administration 

The  test  was  administered  on  June  12,  1984  to  28  812  students  enrolled  in 
Grade  6 English  Language  Arts.  In  all  but  one  jurisdiction  (in  which  only  a 
random  sample  of  schools  was  tested)  the  total  Grade  6 population  was  tested. 
Part  A:  Expressive  Language  - Writing  was  scored  by  104  Grade  6 teachers  under 
the  supervision  of  Student  Evaluation  Branch  personnel.  The  students'  stories 
were marked according to five scales: Content, Development, Sentence
Structure, Vocabulary, and Conventions. Part B: Receptive Language - Reading
was  machine  scored. 


Results 


Part A: Expressive Language - Writing: Overall achievement on the writing
assignment was considered acceptable, with no fewer than 77.7% of the students
scoring  Satisfactory  or  better  on  any  one  marking  scale.  The  highest 
achievement  was  on  Sentence  Structure,  with  82.1%  of  the  students  achieving  a 
Satisfactory  or  better  score;  the  lowest  was  on  Conventions,  with  77.7%  of  the 
students  achieving  a Satisfactory  or  better  score.  Teachers  were  generally 
pleased  with  the  quality  of  the  students'  writing.  The  students  wrote  well  in 
the  narrative  form,  and  usually  enhanced  their  stories  with  creative  and 
interesting  details. 

Part B: Receptive Language - Reading: The provincial average score for the 50
multiple-choice questions was 33.5 (67.0%) and the Standard Deviation was 8.1.
Eighty-nine  per  cent  of  the  students  scored  23/50  (46%)  or  higher  on  Part  B. 

A greater  percentage  of  the  1984  Grade  6 students  correctly  answered  the  11 
MACOSA  questions  than  did  their  1978  counterparts.  The  average  score  was  68.6 
as  compared  to  the  average  1978  score  of  64.4. 


TABLE  OF  CONTENTS 


PAGE 

ACKNOWLEDGMENTS  vi 

CHAPTER  1:  THE  ACHIEVEMENT  TESTING  PROGRAM  1 

Exemptions  from  the  Achievement  Testing  Program  1 

CHAPTER  2:  TEST  DESIGN,  DEVELOPMENT,  AND  DESCRIPTION  2 

Test  Design  and  Development  2 

Test  Description  3 

Part  A:  Expressive  Language  - Writing  3 

Reporting  Categories  for  Part  A 3 

Part  B:  Receptive  Language  - Reading  5 

Reporting  Categories  for  Part  B 5 

Cognitive  Levels  for  Part  B 5 

CHAPTER  3:  ADMINISTRATION  OF  THE  TEST  7 

Determination  of  the  Student  Population  7 

Administration  7 

Data  Collection  7 

Standard-Setting  7 

CHAPTER  4:  SCORING  OF  PART  A:  EXPRESSIVE  LANGUAGE  - WRITING  9 

Organization  of  Markers  9 

Training  9 

Scoring  9 

Reliability  Reviews  10 

CHAPTER  5:  RESULTS  AND  OBSERVATIONS  11 

Test  Results  11 

Results  for  Part  A:  Expressive  Language  - Writing  11 

Results  for  Part  B:  Receptive  Language  - Reading  12 

Discussion  of  Selected  Questions  17 

Summary  of  Observations  21 

CHAPTER  6:  GUIDE  TO  THE  INTERPRETATION  OF  JURISDICTION  RESULTS  22 

Differences  Between  Jurisdiction  and  Provincial  Averages  22 

Absentee  Rates  25 

APPENDIX  A:  GRADE  6 SCORING  GUIDES:  EXPRESSIVE  LANGUAGE  - WRITING  26 

APPENDIX  B:  GRADE  6 SAMPLE  SCORE  SHEET  31 




LIST  OF  TABLES 


TABLE  PAGE

1  Grade 6 Language Arts (English) Achievement Test
   Part A: Expressive Language - Writing Blueprint  4

2  Distribution of Reading Selection Types  5

3  Grade 6 Language Arts (English) Achievement Test
   Part B: Receptive Language - Reading Blueprint  6

4  Part A: Expressive Language - Writing
   Percentage Distribution of Scores  11

5  Results for Part B: Receptive Language - Reading
   (Reporting Categories)  12

6  Results for Part B: Receptive Language - Reading
   (Cognitive Level)  13

7  Part B: Receptive Language - Reading
   Frequency Distribution of Scores  14

8  Comparison of Performance on MACOSA Questions on
   Part B: Receptive Language - Reading  15

9  Question Response Frequencies  16

10  Distribution of Jurisdiction Levels of Achievement
    on Part B: Receptive Language - Reading  24



ACKNOWLEDGMENTS 


The  successful  administration  of  the  Grade  6 Language  Arts  (English) 
Achievement  Test  was  due  to  the  concerted  effort  of  all  involved.  Success 
would  have  been  impossible  without  substantial  contributions  from  many  people, 
particularly  the  administrators,  teachers,  and  students,  who  extended  their 
full  co-operation. 

The advice received from the Test Review Committee regarding design,
development  and  reporting  has  been  particularly  valuable  in  the 
implementation  of  the  Achievement  Testing  Program.  This  Committee  has 
representation  from: 


The  Alberta  Teachers'  Association 
The  Conference  of  Alberta  School  Superintendents 
The  Universities 
Alberta  Education 


The  contribution  made  by  this  group  is  gratefully  acknowledged. 

The technical expertise provided by Dr. T. O. Maguire, Professor, Division of
Educational  Research  Services,  University  of  Alberta,  has  also  contributed 
greatly  to  the  advancement  of  the  Achievement  Testing  Program,  and  his  work  in 
this  area  is  acknowledged  and  appreciated. 


Lloyd  E.  Symyrozum 
Director 

Student  Evaluation  Branch 




Chapter  1 


THE  ACHIEVEMENT  TESTING  PROGRAM 


The  purpose  of  the  Achievement  Testing  Program  is  to  provide  educators, 
trustees,  and  others  with  information,  significant  at  the  provincial  and  local 
levels,  about  student  knowledge,  understanding,  and  skills  in  relation  to 
program  objectives. 

The  achievement  tests  are  specific  to  the  program  of  studies  prescribed  by  the 
Minister  of  Education.  Curriculum  specifications  for  each  subject  area, 
provided  by  the  Curriculum  Branch  and  the  Language  Services  Branch  of  Alberta 
Education,  identify  the  major  content  areas,  the  specific  learning  objectives 
within  each  area,  and  the  emphasis  that  each  objective  is  to  receive.  The 
test  questions  reflect  these  curriculum  specifications. 

The achievement tests are administered on a cyclical basis in four subject
areas: language arts, social studies, mathematics, and science, and at three
grade levels: 3, 6, and 9. In 1984, achievement tests were administered in
Grade 3 Social Studies, Grade 6 Language Arts, and Grade 9 Mathematics.

Following  the  achievement  test  administration  in  June  of  each  year,  the 
results  are  reported  to  each  school  jurisdiction.  These  district  profiles 
include  results  for  each  school  and  each  student,  but  individual  statements  of 
results  are  not  issued  to  students. 

This  report  is  designed  to  assist  school  jurisdictions  in  interpreting  their 
achievement  test  results. 

Exemptions  from  the  Achievement  Testing  Program 

Under  normal  circumstances,  the  following  students  are  exempt  from  achievement 
testing: 


Students  for  whom  grants  are  received  from  the  Special  Educational 
Services  Branch 

Students  in  classes  in  which  the  subject  being  tested  has  been  cycled 
and  taught  in  an  alternate  year 

Students  in  classes  in  which  the  subject  being  tested  has  been  taught 
in  an  alternate  semester 

Students  enrolled  in  English  as  a Second  Language  programs  for  whom 
grants  are  received  under  Section  54(2)  of  the  School  Grants 
Regulations 




Chapter  2 

TEST  DESIGN,  DEVELOPMENT,  AND  DESCRIPTION 


Test  Design  and  Development 

There  were  a number  of  stages  in  the  development  of  the  Grade  6 Language  Arts 
(English) Achievement Test: preparation of curriculum specifications;
development of test design and questions; field testing and revision of
questions;  construction  and  administration  of  a pilot  test;  and  preparation  of 
the  final  test.  After  key  phases  of  development,  the  test  was  reviewed  by  a 
Test  Review  Committee. 

The  Curriculum  Branch  of  Alberta  Education  prepared  curriculum  specifications 
that  identified  the  major  content  areas,  the  specific  objectives  within  each 
area,  and  the  emphasis  each  is  to  receive  in  the  classroom.  The  curriculum 
specifications  were  distributed  to  all  school  jurisdictions  in  the  province  in 
the  publication  Grade  6 Language  Arts  (English)  Curriculum  Specifications 
(1982).  The  Student  Evaluation  Branch  of  Alberta  Education  selected,  from  the 
prepared  curriculum  specifications,  those  specifications  that  could  best  form 
the  basis  for  a paper-and-pencil  test  and  represent  the  important  curricular 
emphases. 

The  Student  Evaluation  Branch  of  Alberta  Education  developed  a test  blueprint 
and reporting categories based on the selected curriculum specifications.
These were then presented to a Test Review Committee.

Test  questions  were  developed  by  Grade  6 Language  Arts  (English)  teachers  from 
all  parts  of  the  province  under  the  supervision  of  the  Student  Evaluation 
Branch. These questions were field-tested and, if necessary, revised. Next, a
pilot  test  was  constructed,  reviewed  by  the  Test  Review  Committee,  and 
administered.  The  test  design  and  blueprints  as  well  as  the  Grade  6 
Curriculum  Specifications  guided  the  development  of  test  questions. 

The  test  design,  with  blueprints,  scoring  guides,  and  sample  questions,  was 
distributed  to  all  school  jurisdictions  in  the  province  in  the  publication 
Student  Achievement  Testing  Program:  Grade  6 Language  Arts  (Student 
Evaluation  Branch  Bulletin  Volume  3,  Number  10,  November  1983). 

The  final  test  was  constructed  from  those  field  and  pilot  test  questions  that 
best  reflected  curricular  intent  and  test  design. 




Test  Description 


The Grade 6 Language Arts (English) Achievement Test consisted of two parts:
Part A: Expressive Language - Writing, and Part B: Receptive Language - Reading.

Part  A:  Expressive  Language  - Writing 

Part  A required  students  to  write  a single  narrative  assignment  based  on  the 
beginning  for  a story.  Students  were  allowed  one  hour  for  writing  Part  A. 

Students  were  not  allowed  to  use  a dictionary  while  writing  Part  A.  Space  was 
provided  for  planning  and  drafting  and  for  revised  work. 

Reporting  Categories  for  Part  A 

To provide information about student writing that is meaningful, students'
responses are examined in terms of writing components that are used as
reporting categories. Factors evaluated were: Content - the selecting of
details appropriate to purpose (whether descriptive details associated with
character and setting, or narrative details associated with actions or
events); Development - the organizing of details into a coherent whole;
Sentence Structure - the varied use of sentence type, length, and structure
for effects such as emphasis; Vocabulary - the correct and effective selection
and use of words and expressions; and Conventions - the correct and effective
use of conventions of writing (i.e., spelling, grammar, punctuation, and
capitalization).

The blueprint for Part A follows in Table 1.




Table 1

Grade 6 Language Arts (English) Achievement Test
Part A: Expressive Language - Writing
Blueprint


REPORTING CATEGORY (Scoring Guide)

CONTENT (Selecting Details Appropriate to Purpose)
Events should be plausible and appropriate to the student's purpose for
communicating. The student should be able to select appropriate details to
describe characters and setting.

DEVELOPMENT (Organizing Details into a Coherent Whole)
The student should be able to place events in a coherent sequence.

SENTENCE STRUCTURE (Structuring Sentences Effectively)
The student should be able to use a variety of sentence structures
effectively in writing.

VOCABULARY (Selecting and Using Words and Expressions Correctly and
Effectively)
The student should be able to use words and expressions effectively in
writing.

CONVENTIONS (Using the Conventions of Language Correctly and Effectively)
The student should be able to communicate clearly in writing by adhering to
appropriate spelling, grammar, punctuation, and capitalization.


DESCRIPTION OF WRITING ASSIGNMENT

The writing assignment follows a story starter that is to be read by
students. The assignment sets a specific writing task, but allows the
student to use his imagination to select supporting details and events to
include in his writing.


RANGE OF MARKS

In each reporting category, students receive a mark within the following
range:

5 - EXCEPTIONAL
4 - PROFICIENT
3 - SATISFACTORY
2 - LIMITED
1 - POOR
0 - INSUFFICIENT




Part  B:  Receptive  Language  - Reading 


Part  B consisted  of  43  questions  based  on  10  reading  selections,  and  7 
discrete  questions  re-administered  from  the  Minister's  Advisory  Committee  on 
Student  Achievement  Test  (1978).  Information  on  the  numbers  and  types  of 
reading  selections  in  Part  B follows  in  Table  2. 

Table 2

Distribution of Reading Selection Types


Reading Selection Type    Number of Selections    Number of Questions

Discrete Questions                 -                       7
Fiction                            5                      21
Non-fiction                        4                      17
Poetry                             1                       5

Total                             10                      50

Students  were  allowed  one  hour  for  writing  Part  B.  Use  of  a dictionary  was 
not  permitted. 

Reporting  Categories  for  Part  B 

Questions were grouped into five reporting categories or subtests: Main Idea
(9 questions), Supporting Detail (9 questions), Vocabulary (16 questions),
Relationships (7 questions), and Conclusions (9 questions). Each reporting
category required a minimum of six questions so that reliable statistics could
be obtained. The reporting categories used in Part B are given in Table 3.

Cognitive  Levels  for  Part  B 

A further  design  consideration  affecting  the  development  of  Part  B was  that  of 
cognitive  level.  Questions  were  classified  according  to  three  cognitive 
levels: Literal Understanding (6 questions), Inferential Understanding (32
questions), and Evaluation (12 questions). By considering cognitive level
when  developing  a test,  the  Student  Evaluation  Branch  attempts  to  ensure  that 
a variety  of  mental  activities  will  be  used  by  students  as  they  write  the 
test.  Questions  listed  under  Literal  Understanding  are  expected  to  be 
answered  using  skills  of  recall  and  recognition;  those  listed  under 
Inferential  Understanding  are  expected  to  elicit  skills  of  analysis, 
interpretation,  and  extrapolation;  and  questions  listed  under  Evaluation  are 
expected  to  draw  forth  judgmental  skills. 

The  classification  of  the  questions  for  each  reporting  category  for  each 
cognitive  level  is  shown  in  Table  3. 
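
The blueprint counts quoted above can be cross-checked with a few lines of Python. This is only an illustrative sketch: the names `BLUEPRINT`, `category_totals`, and `level_totals` are assumptions introduced here, and the per-cell counts are as read from Table 3 (where a category has an empty cell, its position is inferred from the stated column totals).

```python
# Questions per reporting category and cognitive level, in the order
# (literal, inferential, evaluation), as read from Table 3.
# Placement of empty cells is inferred from the stated column totals.
BLUEPRINT = {
    "Main Idea":         (0, 8, 1),
    "Supporting Detail": (1, 5, 3),
    "Vocabulary":        (3, 10, 3),
    "Relationships":     (2, 5, 0),
    "Conclusions":       (0, 4, 5),
}


def category_totals(blueprint):
    """Number of questions in each reporting category (row totals)."""
    return {name: sum(cells) for name, cells in blueprint.items()}


def level_totals(blueprint):
    """Number of questions at each cognitive level (column totals)."""
    return tuple(sum(column) for column in zip(*blueprint.values()))


rows = category_totals(BLUEPRINT)
columns = level_totals(BLUEPRINT)
print(rows)
print(columns, sum(columns))
```

Both sets of totals should add up to the 50 questions in Part B, and every reporting category should meet the stated minimum of six questions.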




Table 3

Grade 6 Language Arts (English) Achievement Test
Part B: Receptive Language - Reading
Actual Test Blueprint


1. MAIN IDEA
   The student should be able to determine the main idea of a reading
   selection.

   Literal Understanding:      -
   Inferential Understanding:  (8) 9, 14, 19, 29, 34, 39, 45, 50
   Evaluation:                 (1) 40
   Total:                      9

2. SUPPORTING DETAIL
   The student should be able to understand supporting details found in
   reading selections and evaluate supporting details in terms of the main
   idea.

   Literal Understanding:      (1) 7
   Inferential Understanding:  (5) 21, 24, 25, 30, 35
   Evaluation:                 (3) 11, 20, 43
   Total:                      9

3. VOCABULARY
   The student should be able to recall the meanings of words and
   expressions, infer word meaning from context, and evaluate
   appropriateness of words used.

   Literal Understanding:      (3) 1, 2, 4
   Inferential Understanding:  (10) 3, 10, 15, 22, 23, 32, 41, 47, 48, 49
   Evaluation:                 (3) 6, 26, 37
   Total:                      16

4. RELATIONSHIPS (Cause and Effect)
   The student should be able to determine, through recall or inference,
   the cause of a stated effect.

   Literal Understanding:      (2) 5, 42
   Inferential Understanding:  (5) 12, 16, 27, 31, 36
   Evaluation:                 -
   Total:                      7

5. CONCLUSIONS
   The student should be able to draw appropriate conclusions from details
   and ideas present in reading selections, and evaluate the relative
   importance of concluding statements.

   Literal Understanding:      -
   Inferential Understanding:  (4) 18, 33, 44, 46
   Evaluation:                 (5) 8, 13, 17, 28, 38
   Total:                      9

TOTAL

   Literal Understanding:      6
   Inferential Understanding:  32
   Evaluation:                 12
   Total:                      50



Chapter  3 


ADMINISTRATION  OF  THE  TEST 


Determination  of  the  Student  Population 

The  larger  school  jurisdictions  could  choose  to  test  either  all  Grade  6 
Language  Arts  (English)  students  or  students  from  randomly  selected  schools. 
School  boards  were  required  to  notify  the  Student  Evaluation  Branch  of  their 
wish  to  have  student  achievement  sampled  in  their  jurisdiction.  Only  one 
jurisdiction  opted  for  sampling. 

Administration 

Jurisdictions  were  requested  in  April  to  report  the  number  of  students 
enrolled  in  Grade  6 Language  Arts  (English)  in  each  school.  In  May,  letters 
were  sent  by  the  Student  Evaluation  Branch  to  superintendents,  principals,  and 
teachers  in  the  province  requesting  their  co-operation  in  the  testing. 
Information  addressed  to  the  superintendents  and  principals  included  the  test 
schedule,  procedures  for  test  administration,  and  requirements  for  returning 
test  materials.  Information  addressed  to  the  teachers  related  to  the 
administration  of  the  test  and  the  return  of  test  materials.  Each 
jurisdiction  was  sent  the  appropriate  number  of  tests  and  administration 
instructions,  packaged  according  to  school.  After  the  test  was  administered, 
teachers  were  instructed  to  collect  all  test  booklets  and  answer  sheets  and 
return  them  to  the  principal  for  forwarding  to  school  board  offices,  which,  in 
turn,  were  responsible  for  sending  the  test  booklets  and  answer  sheets  to  the 
Student  Evaluation  Branch. 

Staff  from  the  Regional  offices  of  Education  supervised  the  administration  of 
the  test  in  private  schools. 

Data  Collection 


A total  of  1001  schools  from  145  public  and  separate  school  jurisdictions 
returned  scorable  booklets  for  28  077  students.  A total  of  77  schools  from 
private  jurisdictions  returned  scorable  booklets  for  735  students. 

Standard-Setting for Part B: Receptive Language - Reading

While  provincial  averages  are  useful  for  comparing  the  scores  of  students  in  a 
particular  school  or  jurisdiction  with  overall  levels  of  achievement,  it  is 
not  possible  to  know  whether  the  students  in  the  province  did  as  well  as  they 
should.  A test  score  by  itself  has  limited  meaning  without  comparison  to  a 
standard.  Tests  vary  in  difficulty:  a raw  score  of  25/50,  for  example,  could 
represent  very  high  achievement  on  one  test,  and  very  low  achievement  on 
another. 




To  establish  a standard  that  allows  the  assessment  of  overall  achievement  on 
the  test,  the  Student  Evaluation  Branch  follows  certain  procedures.  For  the 
Grade  6 Language  Arts  (English)  test,  experienced  Grade  6 teachers  from  all 
parts  of  the  province  met  to  determine  what  raw  score  would  be  expected  on  the 
test  for  a borderline  student.  The  borderline  is  the  division  between  those 
who  could  be  expected  to  achieve  the  minimum  objectives,  and  those  who  could 
not.  After  a review  of  the  curriculum,  it  was  judged  that  80%  of  the  Grade  6 
students  should  be  able  to  achieve  the  minimum  objectives  of  the  Grade  6 
Language  Arts  (English)  curriculum,  as  reflected  by  the  achievement  test, 
given  adequate  teaching  and  resources.  Since  80%  of  the  students  should  be 
able  to  reach  this  level,  the  borderline  student  would  be  at  the  20th 
percentile  in  ability. 

The  teachers  examined  each  question  on  the  test  and  determined  the  difficulty 
of  that  question  for  a 20th  percentile  student.  From  the  individual  question 
difficulties,  the  overall  test  difficulty  for  the  borderline  student  was 
determined.  The  average  of  the  test  difficulties  established  by  the  teachers 
is  the  standard  for  the  test.  For  the  Grade  6 Language  Arts  (English) 
Achievement Test the standard established was as follows: Given the nature
and difficulty of this test, 80% of the students should achieve a score of
23/50  or  better. 
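
The arithmetic behind this procedure can be sketched in Python. This is a minimal illustration, not the Branch's actual computation: it assumes each teacher expressed a question's difficulty as the probability that a borderline (20th percentile) student answers it correctly, and the ratings, the function names `judge_cut_score` and `panel_standard`, and the five-question test are all hypothetical.

```python
from statistics import mean


def judge_cut_score(item_probabilities):
    """Expected raw score of a borderline student according to one judge.

    Assumes each entry is the judged probability of a correct answer.
    """
    return sum(item_probabilities)


def panel_standard(ratings_by_judge):
    """Average the judges' expected scores to obtain the test standard."""
    return mean(judge_cut_score(ratings) for ratings in ratings_by_judge)


# Hypothetical ratings: 3 judges x 5 questions (the real test had 50).
ratings = [
    [0.5, 0.4, 0.6, 0.3, 0.5],
    [0.6, 0.4, 0.5, 0.4, 0.4],
    [0.4, 0.5, 0.5, 0.3, 0.6],
]
standard = panel_standard(ratings)
print(round(standard, 2))
```

With 50 questions and a real panel, the same two steps (sum per judge, then average across judges) would yield a raw-score standard on the 0 to 50 scale.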




Chapter  4 


SCORING  OF  PART  A:  EXPRESSIVE  LANGUAGE  - WRITING 


Organization  of  Markers 

A marking  centre  was  established  in  Edmonton  at  the  Legislature  Annex.  One 
hundred  and  four  teachers  from  across  the  province  scored  Part  A:  Expressive 
Language  - Writing  from  July  23  to  July  27,  1984.  To  qualify  for  marking, 
each  teacher  was  required  to  have  a valid  permanent  Alberta  teaching 
certificate,  to  have  taught  Grade  6 Language  Arts  (English)  for  at  least  two 
years, and to be currently teaching Grade 6 Language Arts (English). In
addition, markers were required to have been recommended by their
superintendents.

Twenty-two  teachers  from  different  parts  of  the  province  were  appointed  group 
leaders.  They  met  with  Student  Evaluation  Branch  personnel  on  Friday,  July 
20,  1984  to  help  prepare  for  the  marking  session.  This  one-day  session  for 
group  leaders  consisted  of  reading,  scoring,  and  discussing  papers  that  were 
generally  representative  of  the  range  of  student  writing  apparent  in  the 
actual  samples.  The  principal  focus  of  the  group  leaders'  discussion  was  the 
appropriateness  of  the  scoring  guides  and  their  application  to  the  students' 
writing.  In  essence,  this  group  validated  the  standard  for  assessing  Grade  6 
students'  writing. 

Training 

On Monday, July 23, the 104 markers met at the Legislature Annex. The first
morning  was  used  for  training.  Markers  were  divided  into  three  groups,  each 
supervised  by  a member  of  the  Student  Evaluation  Branch.  Markers  reviewed  the 
scoring  guides  and  procedures  in  detail.  They  then  moved  into  22  small  groups 
to  score  and  discuss  student  papers  selected  to  exemplify  the  scoring 
criteria.  The  group  leaders  then  led  the  small-group  discussions  about  the 
interpretation  and  application  of  the  scoring  guides. 

Scoring 

The  remainder  of  the  week  was  used  for  the  independent  scoring  of  student 
papers.  Each  paper  was  scored  independently  by  one  marker.  The  one-marker 
system  produces  results  that  are  reliable  at  the  school  and  jurisdiction  level 
but  not  necessarily  at  the  individual  student  level. 

Before  the  papers  were  distributed  to  the  markers,  student  identification  was 
removed  and  the  papers  were  organized  into  bundles  of  12.  Each  marker 
collected  a bundle  of  papers  from  a table  and  entered  his  or  her  ID  number  on 
the  back  of  each  paper.  The  papers  were  then  read  and  scored  independently. 
Scored  papers  were  then  rebundled  and  taken  to  another  table  where  score 
sheets  were  checked  to  see  that  they  had  been  completed  correctly.  Score 
sheets  were  removed  and  processed  for  statistical  analysis  and  reporting. 




Although  the  papers  were  scored  on  a one-marker  system,  597  papers  were 
recirculated  so  that  for  these  papers  a second  set  of  scores  would  be 
available  to  confirm  scoring  consistency.  Of  the  scores  awarded  to  the  597 
papers on a second reading, 91.0% remained identical to the original score, or
varied by only one point, on each of the five scoring scales.
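
A consistency figure of this kind can be computed as the share of paired scores that agree within a tolerance. The sketch below is illustrative only: the function name `agreement_rate` and the ten pairs of marks are hypothetical, and the report does not state exactly how its 91.0% figure was tallied.

```python
def agreement_rate(first_scores, second_scores, tolerance=1):
    """Percentage of score pairs that differ by at most `tolerance` points."""
    if len(first_scores) != len(second_scores) or not first_scores:
        raise ValueError("need two non-empty, equal-length score lists")
    within = sum(
        abs(a - b) <= tolerance for a, b in zip(first_scores, second_scores)
    )
    return 100.0 * within / len(first_scores)


# Hypothetical marks (0-5 scale) for ten papers on one scoring scale.
first = [3, 4, 2, 5, 3, 3, 1, 4, 3, 2]
second = [3, 3, 2, 3, 3, 4, 1, 4, 5, 2]
rate = agreement_rate(first, second)
print(rate)
```

Setting `tolerance=0` would count only identical scores, which gives a stricter view of marker consistency.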


Reliability  Reviews 

Reliability  of  results  was  of  prime  concern  during  the  scoring  sessions,  and 
because  of  this,  reliability  review  sessions  were  scheduled  twice  daily  at 
10:00 a.m. and at 2:00 p.m. (except for Monday and Friday, when only one review
occurred). At these sessions each marker was given a copy of the same paper
to  score  independently  in  each  of  the  five  categories.  Group  discussion  of 
the  scores  assigned  by  each  marker  followed.  Each  marker  was  given  the 
opportunity  to  enter  a second  score  in  each  category  and  the  group  leaders 
forwarded  the  sets  of  scores  to  Student  Evaluation  personnel.  The  pre-  and 
post-discussion  scores  were  tallied  and  the  resultant  distribution  of  scores 
for  each  session  was  posted.  This  information  provided  useful  feedback  for 
the  markers  and  helped  to  ensure  greater  consistency  in  the  application  of  the 
scoring  guides. 

Group  membership  was  changed  at  regular  intervals  during  the  reliability 
review  sessions. 

On  two  occasions,  a group  of  20  markers  participated  in  standard-setting  for 
Part  B:  Receptive  Language  - Reading. 




Chapter  5 


RESULTS  AND  OBSERVATIONS 

Test  Results 

The results of the test are reported separately for Part A and Part B.

Results for Part A: Expressive Language - Writing

Table 4 shows the results for Part A.


Table 4

Part A: Expressive Language - Writing
Percentage Distribution of Scores


                                  Reporting Category

                                            Sentence
Score               Content  Development  Structure  Vocabulary  Conventions

5 (Exceptional)         9.2          8.7        9.3         8.5         11.4
4 (Proficient)         25.3         26.1       26.4        22.1         28.5
3 (Satisfactory)       45.0         43.6       46.4        50.6         37.8
2 (Limited)            18.2         20.0       14.9        17.0         17.7
1 (Poor)                2.1          1.4        2.8         1.6          4.4
0 (Insufficient)        0.2          0.2        0.2         0.2          0.2

Observations regarding the results for Part A are as follows: for Content,
79.5% of the students scored at a Satisfactory level or better, and 20.3% of
the students scored at a Limited or Poor level; for Development, 78.4% of the
students scored Satisfactory or better, and 21.4% scored Limited or Poor; for
Sentence Structure, 82.1% of the students scored Satisfactory or better, and
17.7% scored Limited or Poor; for Vocabulary, 81.2% of the students scored
Satisfactory or better, and 18.6% scored Limited or Poor; for Conventions,
77.7% of the students scored Satisfactory or better, and 22.1% scored Limited
or Poor. Only 0.2% of the students produced written work that was considered
to be insufficient for scoring purposes.

Teacher-markers were generally pleased with the quality of the students'
writing. Students handled the narrative form very well, and made their stories
interesting through creative selection of details. The students often created
characters with quite distinctive personalities. On many papers rough and
revised drafts showed evidence of editing. Such efforts were recognized by
markers in scoring these papers.
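
The "Satisfactory or better" and "Limited or Poor" percentages quoted above follow directly from the Table 4 distributions. A short Python sketch of the arithmetic is given here; the values are copied from Table 4, while the names `TABLE_4` and `summarize` are introduced only for illustration.

```python
# Percentage of students at each score, ordered 5, 4, 3, 2, 1, 0,
# copied from Table 4.
TABLE_4 = {
    "Content":            [9.2, 25.3, 45.0, 18.2, 2.1, 0.2],
    "Development":        [8.7, 26.1, 43.6, 20.0, 1.4, 0.2],
    "Sentence Structure": [9.3, 26.4, 46.4, 14.9, 2.8, 0.2],
    "Vocabulary":         [8.5, 22.1, 50.6, 17.0, 1.6, 0.2],
    "Conventions":        [11.4, 28.5, 37.8, 17.7, 4.4, 0.2],
}


def summarize(distribution):
    """Return (% scoring 3 or higher, % scoring 2 or 1), to one decimal."""
    satisfactory_or_better = round(sum(distribution[:3]), 1)
    limited_or_poor = round(sum(distribution[3:5]), 1)
    return satisfactory_or_better, limited_or_poor


summary = {scale: summarize(dist) for scale, dist in TABLE_4.items()}
highest = max(summary, key=lambda scale: summary[scale][0])
lowest = min(summary, key=lambda scale: summary[scale][0])

for scale, (good, weak) in summary.items():
    print(f"{scale}: {good}% Satisfactory or better, {weak}% Limited or Poor")
print("Highest:", highest, "Lowest:", lowest)
```

The remaining 0.2% on each scale is the Insufficient category, which is why the two percentages for a scale sum to 99.8 rather than 100.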




In  summary,  overall  achievement  on  the  writing  assignment  was  considered 
acceptable, with no fewer than 77.7% of the students scoring Satisfactory or
better on any one marking scale. The highest achievement was on Sentence
Structure and the lowest was on Conventions.

Results  for  Part  B:  Receptive  Language  - Reading 

The  average  score  on  Part  B was  33.5/50  (67.0%). 

Tables  5-9  give  details  about  scores  according  to  reporting  category, 
cognitive  level,  frequency  distribution,  performance  on  MACOSA  tests,  and 
frequency  of  response  question  by  question. 

Table 5

Results for Part B: Receptive Language - Reading
(Reporting Categories)


                          Number of     Raw Score     Standard
Reporting Category        Questions     Mean          Deviation

Total Test                   50           33.5           8.1

1. Main Idea                  9            6.1           2.0
2. Supporting Detail          9            6.5           1.8
3. Vocabulary                16           10.3           2.7
4. Relationships              7            4.9           1.5
5. Conclusions                9            5.8           2.0

Observations regarding the results for Part B are as follows: for Main Idea,
the average score was 6.1/9 (67.8%); for Supporting Detail, the average score
was  6.5/9  (72.2%);  for  Vocabulary,  the  average  score  was  10.3/16  (64.4%);  for 
Relationships,  the  average  score  was  4.9/7  (70.0%);  and  for  Conclusions,  the 
average  score  was  5.8/9  (64.4%). 
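
Each percentage above is the raw score mean divided by the number of questions in the category. The following Python sketch reproduces the conversion from the Table 5 figures; the names `TABLE_5` and `percent_correct` are illustrative assumptions.

```python
# (number of questions, raw score mean) per reporting category, from Table 5.
TABLE_5 = {
    "Main Idea":         (9, 6.1),
    "Supporting Detail": (9, 6.5),
    "Vocabulary":        (16, 10.3),
    "Relationships":     (7, 4.9),
    "Conclusions":       (9, 5.8),
}


def percent_correct(question_count, mean_score):
    """Express a raw score mean as a percentage of the questions available."""
    return round(100 * mean_score / question_count, 1)


percentages = {
    category: percent_correct(count, mean_score)
    for category, (count, mean_score) in TABLE_5.items()
}
for category, value in percentages.items():
    print(f"{category}: {value}%")
```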

Although  performance  in  the  different  reporting  categories  appears  to  show 
some  variation,  caution  is  advised  in  comparing  them.  The  sets  of  questions 
that  make  up  each  reporting  category  were  not  selected  to  be  equal  in  average 
level of difficulty; therefore, differences may be due to variations in
question difficulty rather than in student performance. The averages can be
used, however, in combination with jurisdictional results to detect patterns
of relative strength or weakness in achievement in each of the reporting
categories.




Table  6 shows  raw  scores  for  Part  B by  cognitive  level. 


Table 6

Results for Part B: Receptive Language - Reading
(Cognitive Level)


                               Number of     Raw Score     Standard
Cognitive Level                Questions     Mean          Deviation

Total Test                        50           33.5           8.1

Literal Understanding              6            4.0           1.3
Inferential Understanding         32           21.3           5.6
Evaluation                        12            8.2           2.3

Observations  regarding  the  results  for  Part  B by  cognitive  level  are  as 
follows: for Literal Understanding, students scored an average of 4.0/6
(66.7%); for Inferential Understanding, students scored an average of 21.3/32
(66.6%);  and  for  Evaluation,  students  scored  an  average  of  8.2/12  (68.3%). 

Because  questions  within  each  cognitive  level  vary  in  difficulty,  and  because 
the  average  difficulty  of  the  questions  in  one  cognitive  level  is  not 
necessarily  the  same  as  the  average  difficulty  of  questions  in  another 
cognitive  level,  no  conclusions  can  be  drawn  about  students'  performance  on 
one  cognitive  level  compared  to  performance  on  another. 




Table  7 shows  the  frequency  distribution  of  scores  on  Part  B. 

Table 7

Part B: Receptive Language - Reading
Frequency Distribution of Scores

       Relative    Cumulative              Relative    Cumulative
       Frequency   Frequency               Frequency   Frequency
Score  in %*       in %**           Score  in %*       in %**

    1      -            -              26    2.4         19.8
    2      -            -              27    2.5         22.3
    3      -            -              28    2.9         25.2
    4      -            -              29    3.1         28.3
    5      -            -              30    3.5         31.8
    6      -          0.1              31    3.5         35.3
    7      -          0.1              32    4.0         39.3
    8    0.1          0.1              33    4.4         43.7
    9    0.1          0.3              34    4.2         47.9
   10    0.1          0.4              35    4.8         52.7
   11    0.2          0.6              36    5.0         57.7
   12    0.3          1.0              37    5.3         62.9
   13    0.5          1.4              38    5.4         68.4
   14    0.6          2.0              39    5.5         73.9
   15    0.7          2.7              40    5.2         79.1
   16    0.7          3.4              41    5.1         84.2
   17    0.9          4.3              42    4.5         88.7
   18    1.2          5.5              43    3.8         92.4
   19    1.2          6.7              44    2.9         95.4
   20    1.3          8.0              45    2.0         97.4
   21    1.6          9.6              46    1.4         98.8
   22    1.7         11.3              47    0.7         99.5
   23    1.9         13.2              48    0.4         99.9
   24    2.1         15.2              49    0.1        100.0
   25    2.2         17.4              50      -        100.0

Any score that was achieved by fewer than 0.05% of the population is
represented by a dash (-). It should be noted, therefore, that student scores
ranged from 1 to 50, although the relative frequencies at the upper and lower
ends of the distribution do not appear to indicate this. Two students
achieved a raw score of 50 but, since this represents fewer than 0.05% of the
population, the relative frequency is considered to be 0.0.

The  standard  set  for  achievement  of  the  minimum  objectives  of  the  Grade  6 
Language  Arts  (English)  Achievement  Test  was  met  by  88.7%  of  the  students. 

The  procedure  used  for  establishing  this  standard  is  described  in  detail  on 
pages  7 and  8. 


*Relative  Frequency:  the  percentage  of  students  achieving  each  score. 

**Cumulative  Frequency:  the  percentage  of  students  achieving  at,  or  below, 
each  score. 
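
The figure of 88.7% can be checked against the cumulative frequencies in Table 7. The sketch below assumes the standard is a raw score of 23/50, which is the value stated in the Summary of Observations; the function and variable names are illustrative only.

```python
# Cumulative frequencies (%) copied from Table 7 for scores near the standard.
cumulative_percent = {21: 9.6, 22: 11.3, 23: 13.2}


def percent_at_or_above(score, cumulative):
    """Percentage of students scoring `score` or higher.

    Cumulative frequency is the percentage at or below a score, so the
    share at or above `score` is 100 minus the cumulative value for the
    score one point lower.
    """
    return round(100.0 - cumulative[score - 1], 1)


met_standard = percent_at_or_above(23, cumulative_percent)
print(f"{met_standard}% of students scored 23 or higher")
```

Because the published frequencies are rounded to one decimal place, summing the relative frequencies instead of using the cumulative column can differ from this result by a tenth of a percentage point.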




Table  8 shows  the  percentages  of  students  choosing  correct  answers  for  11 
MACOSA  questions  in  1978  and  in  1984. 

Table 8

Comparison of Performance on MACOSA* Questions
Part B: Receptive Language - Reading

              1978                        1984
Question      Percentage of Students      Percentage of Students
Number        Choosing Correct Answers    Choosing Correct Answers

  1.                 43.6                        49.7
  2.                 67.9                        74.6
  3.                 83.0                        89.8
  4.                 30.4                        35.8
 40.                 72.2                        81.4
 43.                 62.5                        71.9
 45.                 63.2                        63.3
 47.                 75.5                        72.0
 48.                 72.1                        74.0
 49.                 66.5                        70.0
 50.                 71.0                        72.0

Mean                 64.4                        68.6

N                    1597                        28 077

The following observations may be made regarding the scores for these 11
questions: the differences in the percentages of students choosing correct
answers on the 11 MACOSA questions are statistically significant at the .001
level. A greater percentage of students chose correct answers in 1984 than
did Grade 6 students in 1978.
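
The two means reported in Table 8 can be reproduced from the per-question percentages. The sketch below only recomputes those means and counts the questions on which the 1984 percentage is higher; it does not attempt to reproduce the significance test, whose exact method is not described in this report. Variable names are illustrative.

```python
from statistics import mean

# Percentages of students choosing the correct answer, from Table 8,
# in question order: 1, 2, 3, 4, 40, 43, 45, 47, 48, 49, 50.
pct_1978 = [43.6, 67.9, 83.0, 30.4, 72.2, 62.5, 63.2, 75.5, 72.1, 66.5, 71.0]
pct_1984 = [49.7, 74.6, 89.8, 35.8, 81.4, 71.9, 63.3, 72.0, 74.0, 70.0, 72.0]

mean_1978 = round(mean(pct_1978), 1)
mean_1984 = round(mean(pct_1984), 1)

# Number of questions where the 1984 percentage exceeds the 1978 percentage.
improved = sum(1 for old, new in zip(pct_1978, pct_1984) if new > old)

print(mean_1978, mean_1984, improved)
```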


*Minister's  Advisory  Committee  on  Student  Achievement  Test  (1978). 


Table  9 shows  question  response  frequencies  for  all  50  questions  appearing  in 
Part  B. 

Table 9

Question Response Frequencies

Question        Distribution of          Question        Distribution of
Number   Key    Responses in %*          Number   Key    Responses in %*
                 A    B    C    D                         A    B    C    D

  1.      D      7    3   40   50         26.      B     39   55    4    2
  2.      C      4   16   75    5         27.      D      7    3    9   80
  3.      D      4    3    3   90         28.      A     65    9   22    3
  4.      B     29   36    7   28         29.      D     15   25    8   51
  5.      B     21   76    1    2         30.      D      3   15   14   68
  6.      B      6   75    9   11         31.      B     21   58   12    9
  7.      A     90    3    4    3         32.      A     39   14   41    7
  8.      A     70    6    5   18         33.      A     56   27   12    6
  9.      C     12    4   74   10         34.      B     14   59   18    9
 10.      A     66   12    9   13         35.      D      7    3    9   82
 11.      B      9   62   13   16         36.      C     14   18   60    7
 12.      D      3   12   11   74         37.      A     85    6    6    3
 13.      B      5   71   17    7         38.      D      7    3   10   79
 14.      C      8   22   50   21         39.      B      7   73    5   14
 15.      A     50   12   17   20         40.      A     81    7    9    3
 16.      C      3   18   70    9         41.      C      6   33   58    4
 17.      C      9    5   54   32         42.      A     74    5    3   18
 18.      D     17   29   10   44         43.      C      3   12   72   13
 19.      A     84    4    7    6         44.      B     10   67    8   15
 20.      D     14   12   20   54         45.      C     21    9   63    6
 21.      B      8   70   14    8         46.      A     74   12    7    7
 22.      A     80   12    5    3         47.      C      2   21   72    4
 23.      A     53    7   23   17         48.      D      6    7   13   74
 24.      C      4   12   82    2         49.      B     18   70    4    8
 25.      B     28   69    2    1         50.      A     72    9    4   15

*The  sum  of  the  percentages  may  not  be  100  because  the  numbers  have  been 
rounded. 




Discussion  of  Selected  Questions 

The  results  for  each  reporting  category  are  discussed  in  detail  in  the 
following  sections.  Those  skills  that  were  tested  in  each  category  are 
identified  and  the  easiest  and  most  difficult  questions  within  each  category 
are  noted.  Sample  questions  from  the  test  are  provided.  For  each  sample 
question,  the  asterisk(*)  indicates  the  correct  response,  and  the  percentage 
of  students  who  selected  each  alternative  is  given. 

Main Idea (Questions 9, 14, 19, 29, 34, 39, 40, 45, and 50)

Questions  related  to  this  reporting  category  measure  ability  to: 

• determine the main idea of a reading selection
• infer the main idea of a reading selection by using contextual clues
• evaluate the appropriateness of suggested main ideas for a reading
  selection

The  average  raw  score  for  the  9 questions  on  Main  Idea  was  6.1. 

Question  19,  requiring  students  to  infer  the  main  idea  of  a report  by  using 
contextual clues, was found to be the easiest (84% answered correctly).

Question  14,  requiring  students  to  infer  the  main  idea  of  a story  by  using 
contextual  clues,  was  found  to  be  the  most  difficult  (50%  answered  correctly). 

A discussion  of  question  14  follows. 


14. Which title suggests the main idea of this story?

                                   Student Responses
     A. Indian Dances                      8%
     B. Grey Wolf the Dancer              22%
    *C. The Power Within                  50%
     D. The Great Spirit Watches          21%

It would seem that students who selected incorrect answers were confused by
the amount of description about Grey Wolf and his preparations, and did not
realize that "main idea" is a general idea that is drawn from the specific
details, not a topic sentence or a specific detail. This is a challenging
concept for students, and it appears that about half of the students
experienced difficulty with question 14.

Supporting Detail (Questions 7, 11, 20, 21, 24, 25, 30, 35, and 43)

Questions related to this reporting category measure ability to:

• recall supporting detail found in a reading selection
• infer supporting detail found in a reading selection
• evaluate supporting detail in terms of the main idea

The average raw score for the 9 questions on Supporting Detail was 6.5.




Question  7,  requiring  students  to  recall  a supporting  detail  found  in  a 
reading  selection,  was  found  to  be  the  easiest  (90%  answered  correctly). 

Question  30,  requiring  students  to  evaluate  supporting  details,  was  found  to 
be  the  most  difficult  (68%  answered  correctly). 

A discussion of question 35, a relatively easy question, follows.

35. What type of boat did the fishermen use?

                                   Student Responses
     A. A driftboat                        7%
     B. A sailboat                         3%
     C. A motorboat                        9%
    *D. A rowboat                         82%

This question was answered easily by students, possibly because the word
"rowing" is mentioned twice in the story. Students may have used prior
knowledge about boats along with details from the story in choosing the
correct answer. In order to confirm their answer, however, students still had
to refer to the details in lines 9 and 23.

Vocabulary (Questions 1, 2, 3, 4, 6, 10, 15, 22, 23, 26, 32, 37, 41, 47, 48,
and 49)

Questions related to this reporting category measure ability to:

• recall the meanings of words and expressions
• infer word meaning from context
• evaluate the appropriateness of word usage

The average raw score for the 16 questions on Vocabulary was 10.3.

Question 3, requiring students to infer a word meaning from context, was found
to be the easiest (90% answered correctly).

Question 4, requiring students to recall the meaning of a word, was found to
be the most difficult (36% answered correctly).

A discussion of question 26, a question of slightly greater than average
difficulty, follows.

26. Which of the following words would BEST replace "attempt" (line 6)?

                                   Student Responses
     A. Try                               39%
    *B. Risk                              55%
     C. Manage                             4%
     D. Take                               2%




As with all evaluation questions, the four alternatives in question 26 have a
measure of correctness. Such questions require the students to select the
BEST alternative. Students who incorrectly chose alternative A may have done
so because of its position (first) and its plausibility. It is likely that
those students did not re-read the whole of the sentence concerned in order to
confirm the context. The context of the sentence in question, and of the
selection as a whole, indicates that "risk" (alternative B) is the MOST
appropriate synonym for "attempt."

Question  32,  a question  that  confused  some  students,  is  discussed  below. 


32. What does the word "inevitable" (line 14) mean?

                                   Student Responses
    *A. Certain                           39%
     B. Timely                            14%
     C. Rapid                             41%
     D. Repeated                           7%

Question  32  could  have  been  answered  with  reference  to  contextual  clues  in 
lines  14  and  15.  The  word  "awaiting"  and  the  phrase  "once  and  for  all" 
suggest  finality  rather  than  speed. 

Danny  shrank  against  a tree,  awaiting  the  inevitable  charge.  Old 
Majesty  was  about  to  settle  once  and  for  all  their  long-standing 
feud. 

More students chose "rapid" (alternative C) as the answer than chose "certain"
(alternative A), which is the correct answer. Students may have been drawn to
C because throughout the selection events happen quickly. Some students may
not have understood the meaning of "certain" in this context. Those students
who did well on the test as a whole, however, chose A.

Relationships (Questions 5, 12, 16, 27, 31, 36, and 42)

Questions  related  to  this  reporting  category  measure  ability  to: 

• determine,  through  recall,  the  cause  of  a stated  effect 

• determine,  through  inference,  the  cause  of  a stated  effect 

The  average  raw  score  for  the  7 questions  on  Relationships  was  4.9. 

Question 27, requiring students to infer the cause of a stated effect, was
found to be the easiest (80% answered correctly).

Question  31,  also  requiring  students  to  infer  the  cause  of  a stated  effect, 
was  found  to  be  the  most  difficult  (58%  answered  correctly). 




A discussion  of  question  31  follows. 


31. How did the bull die?

                                   Student Responses
     A. He fell off a cliff.              21%
    *B. He was killed by Old Majesty.     58%
     C. He was shot by Danny.             12%
     D. He was killed by Red.              9%

In a question such as 31, students are required to determine the cause of a
stated effect. The fact that the bull was dead was mentioned twice in the
story: "carcass" and "his kill." Over half of the students were able to
correctly associate the two references to the dead bull with the killer.
Others may have been confused by the narration of the story and been attracted
to the first alternative.

Conclusions (Questions 8, 13, 17, 18, 28, 33, 38, 44, and 46)

Questions  related  to  this  reporting  category  measure  ability  to: 

• draw  (infer)  appropriate  conclusions  from  details  and  ideas  present  in 
a reading  selection 

• evaluate  the  relative  importance  of  concluding  statements 

The  average  raw  score  for  the  9 questions  on  Conclusions  was  5.8. 

Question 46, requiring students to infer appropriate conclusions, was found to
be the easiest (74% answered correctly).

Question  18,  also  requiring  students  to  infer  appropriate  conclusions,  was 
found  to  be  the  most  difficult  (44%  answered  correctly). 

A discussion  of  question  18  follows. 


18. Considering the details in the report, which conclusion is correct?

                                                   Student Responses
     A. Gravity flow irrigation is the
        easiest to operate.                               17%
     B. Sprinkler system irrigation
        requires inexpensive machinery.                   29%
     C. Soil in orchards must be plowed
        each year.                                        10%
    *D. Gravity flow irrigation can
        damage the soil.                                  44%

Question  18  directs  students  to  select  an  answer  that  is  a correct  conclusion, 
and  to  do  so  in  terms  of  "the  details  in  the  report."  Students  who  did  not 
notice  the  word  "conclusion"  in  the  question  may  have  selected  incorrect 
answers  that  are  details  from  the  selection.  Re-reading  would  have  helped 
students  to  verify  their  selected  answers. 




Summary of Observations


At  the  provincial  level,  student  achievement  on  Part  A of  the  test  was 
acceptable  in  each  of  the  five  reporting  categories.  More  than  80%  of  the 
students  scored  at  the  Satisfactory  level  or  better  in  Sentence  Structure  and 
Vocabulary  (82.1%  and  81.2%).  Slightly  fewer  than  80%  of  students  scored  at 
the  Satisfactory  level  or  better  in  Content,  Development,  and  Conventions 
(79.5%,  78.4%,  and  77.7%). 

The  provincial  average  for  the  50  multiple-choice  questions  on  Part  B:  Reading 
was  33.5  (67.0%).  The  expected  performance  for  Part  B was  that  80%  of  the 
students  would  score  23/50  (46%)  or  higher.  Student  performance  exceeded  that 
expectation  since  88.7%  of  the  students  scored  23/50  (46%)  or  higher. 

On  the  11  MACOSA  questions,  a greater  percentage  of  students  chose  correct 
answers  in  1984  than  did  Grade  6 students  in  1978.  The  average  score  was 
68.6%  as  compared  to  the  average  1978  score  of  64.4%. 




Chapter  6 


GUIDE  TO  THE  INTERPRETATION  OF  JURISDICTION  RESULTS 

In  addition  to  their  use  in  monitoring  student  achievement  for  the  province  as 
a whole,  the  results  of  the  Grade  6 Language  Arts  (English)  Achievement  Test 
are  most  useful  in  comparing  achievement  in  a particular  jurisdiction  with 
provincial  results.  Care  must  be  exercised,  however,  in  making  these 
comparisons  and  in  drawing  conclusions  from  the  data. 

The  following  jurisdiction  and  school  reports  are  provided  for  each 
jurisdiction  under  separate  cover. 

1.  The  Jurisdiction  Summary  Report  - contains  jurisdiction  equivalents 
of  the  provincial  results  in  tables  4,  5,  and  6. 

2.  School  Summary  Reports  - contain  the  school  equivalents  of  the 
provincial  results  in  tables  4,  5,  and  6. 

3. The Jurisdiction Item Alternative Response Frequency Data for
multiple-choice  questions  is  equivalent  to  the  provincial  results  in 
Table  9. 

4.  The  School  Item  Alternative  Response  Frequency  Data  for  the 
multiple-choice  questions  is  equivalent  to  the  provincial  results  in 
Table  9. 

5. Individual Student Subscale Results

These  reports  are  confidential  to  the  jurisdiction. 

Differences  Between  Jurisdiction  and  Provincial  Averages 

Jurisdictions  are  provided  with  their  average  scores  for  each  reporting 
category.  These  scores  may  be  compared  to  the  provincial  average  for  the  same 
reporting  category.  However,  the  importance  of  the  differences  between 
jurisdiction  averages  and  provincial  averages  is  not  always  obvious.  To  aid 
in  the  interpretation  of  differences  between  the  averages,  jurisdiction 
reports  indicate  when  the  difference  between  the  jurisdiction  average  and  the 
provincial  average  is  unlikely  to  be  due  to  chance  variation  in  the  abilities 
of students. For the purposes of the provincial testing program, the 95%
confidence level is used. That is to say, if there is less than one chance
in 20 that a difference this large would occur by chance, the jurisdiction
average is considered different from the provincial average. Otherwise, it is
classified as not different from the provincial average.

A statistical  test  of  significance  is  made  for  each  reporting  category  for 
each jurisdiction. The provincial average for that reporting category and the
provincial standard deviation for that category are taken as the true
population average and standard deviation. The standard deviation of a
distribution is a
measure  of  the  variation  of  scores.  In  a normal  distribution,  there  is  a 
fixed  and  known  relationship  between  the  standard  deviation  and  the  proportion 
of individual scores in any part of the distribution. For example, about 68%
of scores fall within one standard deviation of the mean (average). If a
normally distributed test has a mean of 50 and a standard deviation of 10,
about 68% of those writing the test scored between 40 and 60.
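
The 68% figure can be checked numerically. The sketch below assumes scores follow an exactly normal distribution with the mean of 50 and standard deviation of 10 used in the example above; real test scores are only approximately normal.

```python
from statistics import NormalDist

# Hypothetical test from the example above: mean 50, standard deviation 10.
test_scores = NormalDist(mu=50, sigma=10)

# Share of scores between one standard deviation below and above the mean.
share_within_one_sd = test_scores.cdf(60) - test_scores.cdf(40)

print(f"{share_within_one_sd:.1%} of scores fall between 40 and 60")
```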




The  amount  of  chance  variation  in  jurisdiction  averages  varies  with  the  number 
of  students  tested  in  that  jurisdiction.  When  any  random  sample  is  drawn  from 
a population,  its  average  is  expected  to  be  the  same  as  the  population 
average. Yet the actual group average may vary because of individual
variation in the sample. The size of this variation is described by the
standard error of the mean.

The  amount  of  variation  in  the  averages  of  random  samples  drawn  from  the 
population  is  related  to  the  standard  deviation  of  the  scores  in  the 
population.  When  the  population  mean  and  standard  deviation  are  known,  as  in 
the  case  of  the  achievement  tests,  it  is  possible  to  determine  how  likely  it 
is  that  any  subgroup,  such  as  a jurisdiction,  represents  a random  sample  of 
the  population  in  achievement.  This  statistical  test,  known  as  a one-sample 
z-test,  is  the  one  that  has  been  applied  to  jurisdiction  scores  in  each 
reporting  category.  Thus  if  a jurisdiction  is  classified  as  different  from 
the  provincial  average,  there  is  less  than  one  chance  in  20  that  the 
difference  between  the  average  score  for  the  jurisdiction  on  that  reporting 
category  and  the  provincial  average  would  occur  in  a group  of  that  size 
selected  at  random  from  all  students  in  the  province.  In  other  words,  the 
difference  is  statistically  significant  at  the  0.05  level. 

Because these achievement levels are calculated taking jurisdiction size into
consideration, two jurisdictions with the same averages but of different sizes
may be classified differently. The larger jurisdiction would be more likely
to be classified as above or below the provincial average, because the amount
of chance variation is smaller in larger jurisdictions, so the same observed
difference is larger relative to what chance alone would produce.

For  example,  imagine  two  jurisdictions,  A with  25  students  writing  Test  X,  and 
B with  100  students  writing  Test  X.  Both  jurisdictions  have  the  same  average, 
54.2.  Test  X has  a provincial  average  score  of  50.0  and  a standard  deviation 
of  12.0.  The  difference  between  the  provincial  average  and  the  jurisdiction 
average  is  4.2.  A difference  this  large  would  be  expected  8 times  out  of  100 
for  groups  of  25  selected  at  random  from  the  population,  and  fewer  than  3 
times  out  of  1000  for  groups  of  100.  Thus  the  difference  from  the  provincial 
average  would  not  be  statistically  significant  for  Jurisdiction  A,  but  would 
be  for  Jurisdiction  B. 
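
The example above can be worked through with a one-sample z-test. This is a minimal sketch, not the Branch's actual procedure: it assumes a two-tailed test (the report does not say whether one or two tails were used), and the function name is illustrative.

```python
from math import erfc, sqrt


def one_sample_z(sample_mean, n, pop_mean, pop_sd):
    """Return (z, two-tailed p) for a group mean against known population values."""
    # Standard error of the mean shrinks as the group gets larger.
    standard_error = pop_sd / sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    p_two_tailed = erfc(abs(z) / sqrt(2))
    return z, p_two_tailed


# Test X from the example: provincial mean 50.0, standard deviation 12.0.
z_a, p_a = one_sample_z(54.2, 25, 50.0, 12.0)    # Jurisdiction A, 25 students
z_b, p_b = one_sample_z(54.2, 100, 50.0, 12.0)   # Jurisdiction B, 100 students

significant_a = p_a < 0.05
significant_b = p_b < 0.05

print(f"A: z = {z_a:.2f}, p = {p_a:.4f}, significant: {significant_a}")
print(f"B: z = {z_b:.2f}, p = {p_b:.4f}, significant: {significant_b}")
```

Under these assumptions the same 4.2-point difference gives a z value of 1.75 for the group of 25 and 3.5 for the group of 100, which is why only Jurisdiction B is classified as different from the provincial average.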

When  it  has  been  determined  that  a difference  is  significant,  the  direction  of 
the  difference  is  important,  particularly  for  those  jurisdictions  below  the 
provincial  average.  These  jurisdictions  are  encouraged  to  identify  the 
sources  of  these  differences. 

Table  10  indicates  the  percentage  of  jurisdictions  classified  as  significantly 
above  or  below  the  provincial  average  for  each  reporting  category. 




Table 10

Distribution of Jurisdiction Levels of
Achievement on Part B: Receptive Language - Reading

Reporting             % Below the           % Not Different From    % Above the
Category              Provincial Average    Provincial Average      Provincial Average

Total                       15.9                  66.9                    17.2
Main Idea                   20.7                  66.9                    12.4
Supporting Detail           15.2                  66.9                    17.9
Vocabulary                  17.2                  71.0                    11.7
Relationships               12.4                  71.0                    16.6
Conclusions                 22.1                  64.1                    13.8
Literal                     12.4                  73.1                    14.5
Inferential                 15.9                  67.6                    16.6
Evaluation                  22.8                  66.2                    11.0

A test  score  does  not  indicate  why  a particular  performance  occurred,  only 
that  it  did  occur.  Identification  of  reasons  for  that  performance  should  be 
undertaken  once  results  have  been  studied.  There  are  a variety  of  factors 
that  should  be  examined: 

1.  Student  motivation.  Consideration  should  be  given  to  the  degree  to 
which  students  were  motivated  to  perform  to  their  levels  of  ability. 

2.  Student  ability.  While  the  notion  of  a target  region  is  designed  to 
take  into  consideration  year-to-year  fluctuations  in  the  average 
ability  levels  of  students,  it  is  possible  that  a group  of  students 
with  a particularly  high  or  low  average  ability  may  come  through  a 
system.  This  is  much  more  likely  to  be  a factor  in  small  systems 
than  in  large  ones. 

3.  Teaching  and  curriculum.  Consideration  should  be  given  to  the  type 
of  instruction  students  have  received  in  the  jurisdiction  and  the 
adequacy  of  curricular  implementation. 




There  will  be  other  factors  that  are  of  importance  in  particular 
jurisdictions.  School  boards  wishing  to  examine  further  the  results  in  the 
light  of  local  factors  are  encouraged  to  establish  their  own  local 
interpretation  panels. 

Absentee  Rates 


If more than 10% of the eligible students in a jurisdiction did not write the
test, the reported averages for that jurisdiction may not accurately represent
the achievement of all eligible students. Teacher-assigned marks for students
who did not
write  could  be  compared  with  teacher-assigned  marks  for  students  who  did 
write.  If  the  averages  are  the  same  for  the  two  groups,  the  reported 
jurisdiction  averages  are  probably  representative.  If  the  averages  are 
different,  some  estimates  can  be  made  of  what  the  jurisdiction  averages  might 
have  been  if  all  students  had  written  the  test.  Jurisdictions  with  higher 
absentee  rates  may  wish  to  contact  the  Student  Evaluation  Branch  for 
assistance  in  estimating  their  jurisdiction  averages. 




Appendix  A 


GRADE  6 SCORING  GUIDES:  EXPRESSIVE  LANGUAGE  - WRITING 

Reporting  Category:  CONTENT 

(Selecting  Details  Appropriate  to  Purpose*) 


SCORE 

DESCRIPTION  OF  PERFORMANCE 

5 EXCEPTIONAL 

Events are plausible within the context
established by the writer. Events are
consistent  with  atmosphere  and  character 
motivation.  Specific  details  describe 
characters  physically,  and  clearly 
suggest  or  imply  their  motives.  Details 
describing  the  setting  create  atmosphere. 

4 PROFICIENT 

Events  are  plausible  within  the  context 
established  by  the  writer.  Events  are 
connected  to  character  motivation. 
Appropriate  details  describe  characters 
and  hint  at  their  motives.  Details 
describing  the  setting  create,  but  do  not 
sustain,  atmosphere. 

3 SATISFACTORY 

Most  events  are  plausible  although 
credibility  may  falter  OR  events  are 
conventional  and  predictable. 
Appropriate  details  present  a physical 
description  of  characters  and  setting. 

2 LIMITED 

Some  events  are  plausible,  but  others 
lack credibility. Few appropriate
details  describe  characters  or  setting. 

1 POOR 

Events  are  implausible.  No  appropriate 
details  describe  character  and  setting. 

0 INSUFFICIENT 

Too  little  writing  exists  for  a judgment 
to  be  formed. 

*Details  selected  by  the  student  will  be  either  descriptive  details  associated 
with  character  or  setting,  or  narrative  details  associated  with  actions  or 
events.




Reporting  Category:  DEVELOPMENT 

(Organizing  Details  into  a Coherent  Whole) 


SCORE 

DESCRIPTION  OF  PERFORMANCE 

5 

EXCEPTIONAL

Events  are  placed  in  a coherent  sequence  and 
are  ordered  for  effect.  Details  describing 
character  and  setting  are  skilfully  unified 
with  the  story's  action.  The  story's  ending 
conveys  an  appropriate  sense  of  closure. 

4 

PROFICIENT

Events  are  placed  in  a coherent  sequence. 
Details  describing  character  and  setting  are 
united  with  the  story's  action.  A sense  of 
closure  is  achieved. 

3 

SATISFACTORY 

Events  are  placed  in  a generally  coherent 
sequence.  Details  describing  character  and 
setting  seem  to  be  included  as  afterthoughts, 
but  never  act  as  a disorganizing  influence. 
Closure  is  attempted. 

2 

LIMITED 

A sequencing  of  events  can  be  detected,  but 
coherence  is  not  achieved.  Details  describing 
character  and/or  setting  are  not  united  with 
the  story's  action.  A sense  of  closure  is 
absent;  OR  if  closure  is  attempted,  it  is 
inappropriate.

1 

POOR 

No  sequencing  of  events  is  discernible.  A 
sense  of  closure  is  absent. 

0 

INSUFFICIENT 

Too  little  writing  exists  for  a judgment  to  be 
formed.  Writing  that  has  been  awarded  a 0 for 
Content  is  insufficient. 



Reporting Category: SENTENCE STRUCTURE

(Structuring  Sentences  Effectively) 


SCORE 

DESCRIPTION  OF  PERFORMANCE* 

5 

EXCEPTIONAL 

Variety  of  sentence  type,  length,  and  structure 
is  used  for  effects  such  as  emphasis. 
Co-ordination  has  been  controlled,  and 
subordination  is  used  appropriately.  Sentence 
fragments,  if  present,  are  used  for  effect. 

4 

PROFICIENT 

Variety  of  sentence  type,  length,  and  structure 
is  evident.  Co-ordination  is  seldom  overused, 
and  subordination  has  been  successfully 
attempted.  There  are  few  inadvertent  sentence 
fragments  and/or  run-on  sentences. 

3 

SATISFACTORY 

Sentences  show  some  variety  in  type,  length, 
and  structure  although  co-ordination  may  be 
overused.  Subordination  may  be  present. 
Inadvertent sentence fragments and/or run-on
sentences  are  in  evidence,  but  do  not  impede 
meaning. 

2 

LIMITED 

Sentences  show  little  variety  and  many  are 
awkwardly  structured.  An  overdependence  on 
co-ordination  is  demonstrated.  Subordination, 
if  used,  is  inappropriate.  Inadvertent 
sentence  fragments  and/or  run-on  sentences  are 
frequent  and  impede  meaning. 

1 

POOR 

Sentences  are  immature  and  repetitious  in  their 
patterns.  Co-ordination  has  been  used  almost 
exclusively.  Inadvertent  sentence  fragments 
and/or  run-on  sentences  are  common  and  impede 
meaning. 

0 

INSUFFICIENT 

Too  little  writing  exists  for  a judgment  to  be 
formed.  Writing  that  has  been  awarded  a 0 for 
Content  is  insufficient. 

*Some descriptions of performance have been adapted from C. R. Cooper,
"Holistic Evaluation of Writing" in Cooper and Odell, Evaluating Writing:
Describing, Measuring, Judging. National Council of Teachers of English,
1977.




Reporting  Category:  VOCABULARY 

(Selecting  and  Using  Words  and  Expressions  Correctly  and  Effectively) 


SCORE 

DESCRIPTION  OF  PERFORMANCE 

5 

EXCEPTIONAL 

Specific,  concrete  words  predominate  and  have 
been  selected  to  create  vivid  images  or  precise 
details.  Denotative  meanings  are  accurate  and 
effective.  Words  are  frequently  used  for 
connotative  effect. 

4 

PROFICIENT 

Frequent  use  of  specific  or  concrete  words  adds 
clarity  to  the  detail  created.  Denotative 
meanings  are  accurate  and  effective.  Words  are 
sometimes  used  for  connotative  effect. 

3 

SATISFACTORY 

General  or  abstract  words  are  often  used  where 
specific  or  concrete  words  would  have  been  more 
effective.  Denotations  are  accurate.  Words 
are  seldom  used  for  connotative  effect. 

2 

LIMITED 

General  words  are  usually  used  where  some 
specific  words  would  have  been  more  effective. 
Denotative  meanings  are  generally  accurate,  but 
choice  of  words  is  limited. 

1 

POOR 

Words  convey  only  vague  or  general  meanings. 
Denotative  meanings  are  sometimes  inaccurate, 
and  choice  of  words  is  restricted. 

0 

INSUFFICIENT 

Too  little  writing  exists  for  a judgment  to  be 
formed.  Writing  that  has  been  awarded  a 0 for 
Content  is  insufficient. 



Reporting  Category:  CONVENTIONS 

(Using  the  Conventions  of  Language  Correctly  and  Effectively) 


SCORE 

DESCRIPTION  OF  PERFORMANCE 

5 

EXCEPTIONAL 

The  communicative  power  of  the  composition  is 
enhanced  because  of  careful  capitalization, 
spelling,  punctuation,  and  grammar. 

4 

PROFICIENT 

Communication  is  clear  because  of  essentially 
correct  capitalization,  spelling,  punctuation, 
and  grammar. 

3 

SATISFACTORY 

Communication  is  adequate  because  of  generally 
correct  capitalization,  spelling,  punctuation, 
and  grammar. 

2 

LIMITED 

Communicative  power  is  reduced  because  of 
incorrect  capitalization,  spelling, 
punctuation,  and  grammar. 

1 

POOR 

The  communicative  power  of  the  composition  is 
very  weak  because  of  incorrect  spelling, 
grammar,  punctuation,  and  capitalization. 

0 

INSUFFICIENT 

Too  little  writing  exists  for  a judgment  to  be 
formed.  Writing  that  has  been  awarded  a 0 for 
Content  is  insufficient. 



APPENDIX  B 


SAMPLE  SCORE  SHEET 
Grade  6 Language  Arts  (English) 
Achievement  Test 


Marker  ID  number: 


CONTENT               0    1    2    3    4    5

DEVELOPMENT           0    1    2    3    4    5

SENTENCE STRUCTURE    0    1    2    3    4    5

VOCABULARY            0    1    2    3    4    5

CONVENTIONS           0    1    2    3    4    5



